-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "bundle init" when run from Databricks #1744
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This also needs some unit tests. You could use a mocked workspace client and set it in the context for the unit test. You can use t.Setenv
to set the DATABRICKS_RUNTIME_VERSION
env var for the tests.
See TestResolveClusterReference
for a reference of a mocked workplace client.
a46942f
to
20ee23e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this! This was much less involved than I had expected. Just one remaining comment + one nit.
|
||
const envDatabricksRuntimeVersion = "DATABRICKS_RUNTIME_VERSION" | ||
|
||
func RunsOnDatabricks(ctx context.Context) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for a common lib function. I'd expect this to also check for the existence of /Workspace though to avoid false positives. There may be all kinds of reasons why customers set the env var locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to test for /Workspace
as it makes the code using it harder to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, testing is a pain :( We can leave it out, but then we should at least emit a debug log message whenever there's a match. There will be false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is not an issue as long as you don't inline these expectations.
You can store a bool on the context that stores whether or not we're running on DBR (see bundle/context.go
for inspiration). In tests, you mark it as always true or always false depending on what you want to test. For the real CLI run, you run a routine that performs the actual detection and stores it in the context.
For the check itself, it can look at /proc/mounts
for the workspace fuse mount, it can check for /.fuse-mounts
, it can check for /databricks
, and I'm sure there are a couple other stable ways to determine this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this approach in #1889.
Besides the environment variable, it also checks for the presence of a /databricks
directory.
libs/template/renderer.go
Outdated
@@ -313,8 +319,7 @@ func (r *renderer) persistToDisk() error { | |||
_, err := os.Stat(path) | |||
if err == nil { | |||
return fmt.Errorf("failed to initialize template, one or more files already exist: %s", path) | |||
} | |||
if err != nil && !errors.Is(err, fs.ErrNotExist) { | |||
} else if !errors.Is(err, fs.ErrNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: isn't this easier to read without the else if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me other than one comment. Please TAL!
libs/template/file.go
Outdated
if strings.HasPrefix(path, "/Workspace/") && runtime.RunsOnDatabricks(ctx) { | ||
isNotebook, _, _ := notebook.DetectWithContent(path, content) | ||
return isNotebook | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: remove else block
c409696
to
84d1bbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple small comments remaining.
FWIW, in hindsight it would have been neater to use a filer for writing everything. Then we could have swapped out the writing filer for the extension-aware workspace filer and things would have worked transparently (and all writes would use the API as opposed to only notebooks).
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar", data)) | ||
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar.ipynb", data)) | ||
|
||
t.Setenv("DATABRICKS_RUNTIME_VERSION", "14.3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use env.Set
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference? I see t.Setenv
used in tests. If I change it to env.Set
then my tests break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With env.Set
it adds the variable to the context and passes it along in the context.
If it broke then either 1) you weren't capturing the returned ctx
, or 2) the function doesn't use env.Get
to access the environment variables.
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar", data)) | ||
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar.ipynb", data)) | ||
|
||
t.Setenv("DATABRICKS_RUNTIME_VERSION", "14.3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With env.Set
it adds the variable to the context and passes it along in the context.
If it broke then either 1) you weren't capturing the returned ctx
, or 2) the function doesn't use env.Get
to access the environment variables.
|
||
const envDatabricksRuntimeVersion = "DATABRICKS_RUNTIME_VERSION" | ||
|
||
func RunsOnDatabricks(ctx context.Context) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is not an issue as long as you don't inline these expectations.
You can store a bool on the context that stores whether or not we're running on DBR (see bundle/context.go
for inspiration). In tests, you mark it as always true or always false depending on what you want to test. For the real CLI run, you run a routine that performs the actual detection and stores it in the context.
For the check itself, it can look at /proc/mounts
for the workspace fuse mount, it can check for /.fuse-mounts
, it can check for /databricks
, and I'm sure there are a couple other stable ways to determine this.
|
||
func RunsOnDatabricks(ctx context.Context) bool { | ||
value, ok := env.Lookup(ctx, envDatabricksRuntimeVersion) | ||
return value != "" && ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need the value to be non-empty, then you don't need to check if it exists (the OK).
You can use _
for unused variables, or use env.Get
directly:
Lines 51 to 56 in a4ba0bb
// Get key from the context or the environment. | |
// Context has precedence. | |
func Get(ctx context.Context, key string) string { | |
v, _ := Lookup(ctx, key) | |
return v | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parking this PR.
Discussed that we should take the import approach for all files.
## Changes Whether or not the CLI is running on DBR can be detected once and stored in the command's context. By storing it in the context, it can easily be mocked for testing. This builds on the simpler approach and conversation in #1744. It unblocks testing of the DBR-specific paths while not compromising on the checks we can perform to test if the CLI is running on DBR. ## Tests * Unit tests for the new `dbr` package * New unit test for the `ConfigureWSFS` mutator
## Changes While working on the v2 of #1744, I found that: * Template initialization first copies built-in templates to a temporary directory before initializing them * Reading a template's contents goes through a `filer.Filer` but is hardcoded to a local one This change updates the interface for reading templates to be `fs.FS`. This is compatible with the `embed.FS` type for the built-in templates, so they no longer have to be copied to a temporary directory before being used. The alternative is to use a `filer.Filer` throughout, but this would have required even more plumbing, and we don't need to _read_ templates, including notebooks, from the workspace filesystem (yet?). As part of making `template.Materialize` take an `fs.FS` argument, the logic to match a given argument to a particular built-in template in the `init` command has moved to sit next to its implementation. ## Tests Existing tests pass.
## Changes When running the CLI on Databricks Runtime (DBR), use the extension-aware filer to write an instantiated template if the instance path is located in the workspace filesystem. Notebooks cannot be written through the workspace filesystem's FUSE mount. As a result, this is the only method for initializing templates that contain notebooks when running the CLI on DBR and writing to the workspace filesystem. Depends on #1910 and #1911. Supersedes #1744. ## Tests * Manually confirmed I can initialize a template with notebooks when running the CLI from the web terminal.
This was superseded by #1912. The new approach is to use a |
Changes
When started from a Databricks cluster use import API when creating notebooks from
bundle init
.Tests
Manually tested by executing
databricks bundle init
from a web terminal on a Databricks cluster.